Inferring Demographic Attributes of Anonymous Internet Users

Authors

  • Dan Murray
  • Kevan Durrell
Abstract

Today it is quite common for web page content to include an advertisement. Since advertisers often want to target their message to people with certain demographic attributes, the anonymity of Internet users poses a special problem for them. The purpose of the present research is to find an effective way to infer demographic information (e.g. gender, age or income) about people who use the Internet but for whom demographic information is not otherwise available. Our hope is to build a high-quality database of demographic profiles covering a large segment of the Internet population without having to survey each individual Internet user.

Though Internet users are largely anonymous, they nonetheless provide a certain amount of usage information. Usage information includes, but is not limited to, (a) search terms entered by the Internet user and (b) web pages accessed by the Internet user. In this paper, we describe an application of the Latent Semantic Analysis (LSA) [1] information retrieval technique to construct a vector space in which we can represent the usage data associated with each Internet user of interest. Subsequently, we show how the LSA vector space enables us to produce demographic inferences by supplying the input to a three-layer neural model trained using the scaled conjugate gradient (SCG) [9] method.

1: Introduction

1.1: The Problem

The Internet attracts a large number of users and thus holds great potential as an advertising medium. Today it is quite common for web page content to include an advertisement. Since advertisers often want to target their message to people with certain demographic attributes, the anonymity of Internet users poses a special problem for them. The purpose of the present research is to find an effective way to infer demographic information about people who use the Internet but for whom demographic information is not otherwise available. Our hope is to build a high-quality database of demographic profiles covering a large segment of the Internet population without having to survey each individual Internet user.

The specific scientific and technological objectives of the research described in this paper are as follows:

  • To establish the possibility of inferring up to six demographic facts (sex, age, income, marital status, level of education and presence of children in the home) with a minimum 60% statistical confidence for a subset of the Internet population.
  • To establish the possibility of making at least one demographic inference about a subset of web users who account for more than 50% of the Internet traffic observed at a major website (e.g. a popular web search engine).
  • To establish the possibility of making demographic inferences in real-time, rather than building profiles of web users a priori. This would allow us to generate inferences on an as-needed basis using the most recent usage data.

The research effort was successful on the first two points, while work on the final objective is ongoing.
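The pipeline outlined in the abstract (usage data represented in an LSA vector space, then passed to a three-layer neural model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy usage matrix, the labels, the choice of two LSA dimensions, the hidden-layer size, and the helper names `lsa_fit`, `lsa_project`, `forward` and `train` are all hypothetical. The network is trained with plain gradient descent as a simple stand-in; it is not the scaled conjugate gradient (SCG) method the paper refers to.

```python
import numpy as np

# Hypothetical toy data: one row per anonymous user, one column per usage
# feature (a search term or visited page); entries are occurrence counts.
usage = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 0, 0, 0, 0],
    [0, 0, 3, 2, 0, 0],
    [0, 0, 2, 3, 0, 1],
    [4, 1, 0, 0, 2, 0],
    [0, 0, 1, 4, 0, 0],
], dtype=float)
# Made-up binary attribute for users with known demographics (illustration only).
labels = np.array([1, 1, 0, 0, 1, 0], dtype=float)


def lsa_fit(matrix, k):
    """Return the top-k right singular vectors (feature directions) of matrix."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[:k]


def lsa_project(rows, vt_k):
    """Fold usage rows into the k-dimensional LSA space."""
    return rows @ vt_k.T


def forward(x, params):
    """Three layers: input -> tanh hidden layer -> sigmoid output."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(x @ w1 + b1)
    prob = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))
    return hidden, prob


def train(x, y, hidden_units=4, steps=2000, lr=0.5, seed=0):
    """Plain gradient descent on mean cross-entropy (stand-in for SCG)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(x.shape[1], hidden_units))
    b1 = np.zeros(hidden_units)
    w2 = rng.normal(scale=0.5, size=hidden_units)
    b2 = 0.0
    for _ in range(steps):
        hidden, prob = forward(x, (w1, b1, w2, b2))
        err = (prob - y) / len(y)  # gradient of the loss w.r.t. the output logit
        grad_hidden = np.outer(err, w2) * (1.0 - hidden ** 2)
        w2 = w2 - lr * (hidden.T @ err)
        b2 = b2 - lr * err.sum()
        w1 = w1 - lr * (x.T @ grad_hidden)
        b1 = b1 - lr * grad_hidden.sum(axis=0)
    return w1, b1, w2, b2


vt_k = lsa_fit(usage, k=2)
user_vectors = lsa_project(usage, vt_k)
scale = np.abs(user_vectors).max()  # keep network inputs within [-1, 1]
params = train(user_vectors / scale, labels)
_, probs = forward(user_vectors / scale, params)

# A previously unseen user is folded into the same space before inference.
new_user = np.array([[2.0, 1.0, 0.0, 0.0, 0.0, 0.0]])
_, new_prob = forward(lsa_project(new_user, vt_k) / scale, params)
```

Here `probs` holds the model's estimated probability of the attribute for each training user, and `new_prob` the estimate for the unseen user. Projecting a new user with the already fitted `vt_k` avoids recomputing the decomposition, which is one plausible way to approach the as-needed inference mentioned in the third objective.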

Similar Resources

Inferring Demographics and Social Networks of Mobile Device Users on Campus From AP-Trajectories

Exploring demographics and social networks of Internet users are widely used for many applications such as recommendation systems. The popularity of mobile devices (e.g., smartphones) and location-based Internet services (e.g., Google Maps) facilitates the collection of users’ locations over time. Despite recent efforts to predict users’ attributes (e.g., age and gender) and social networks bas...

Privacy-Preserving Predicate Proof of Attributes with CL-Anonymous Credential

The anonymous credential system allows users to convince relying parties the possession of a credential released by an issuer. To adhere to the minimal information disclose principle, the anonymous credential facilitates predicate proofs of attributes without revealing the values. In this paper, we extend the pairing-based CL-anonymous credential system and present a series of attributes proof ...

Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast

We examine communications in a social network to study user emotional contrast – the propensity of users to express different emotions than those expressed by their neighbors. Our analysis is based on a large Twitter dataset, consisting of the tweets of 123,513 users from the USA and Canada. Focusing on Ekman’s basic emotions, we analyze differences between the emotional tone expressed by these...

A Comparative Study of Demographic Attribute Inference in Twitter

Social media platforms have become a major gateway to receive and analyze public opinions. Understanding users can provide invaluable context information of their social media posts and significantly improve traditional opinion analysis models. Demographic attributes, such as ethnicity, gender, age, among others, have been extensively applied to characterize social media users. While studies ha...

Author gender identification from text using Bayesian Random Forest

Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...

Publication date: 1999